Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
We develop a novel generative model for zero-shot learning to recognize fine-grained unseen classes without training samples. Our observation is that generating holistic features of unseen classes fails to capture every attribute needed to distinguish small differences among classes. We propose a feature composition framework that learns to extract attribute-based features from training samples and combines them to construct fine-grained features for unseen classes. Feature composition allows us not only to selectively compose features of unseen classes from only the relevant training samples, but also to obtain diversity among composed features by changing the samples used for composition. In addition, instead of building a global feature of an unseen class, we use all attribute-based features to form a dense representation consisting of fine-grained attribute details. To recognize unseen classes, we propose a novel training scheme that uses a discriminative model to construct features that are subsequently used to train the model itself. We therefore train the discriminative model directly on composed features without learning separate generative models. We conduct experiments on four popular datasets, DeepFashion, AWA2, CUB, and SUN, showing that our method significantly improves the state of the art.
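The composition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `compose_dense_feature`, the donor-selection rule (pick among the `k` seen classes whose attribute strength is closest to the unseen class), and the toy data are all assumptions made for illustration. The sketch only shows how a dense feature, with one vector per attribute, might be assembled from different training samples, and how varying the donors yields diverse composed features.

```python
import random


def compose_dense_feature(unseen_attrs, seen_class_attrs, samples, rng, k=2):
    """Compose one dense feature (one vector per attribute) for an unseen class.

    unseen_attrs:     list of A attribute strengths describing the unseen class.
    seen_class_attrs: dict mapping seen class name -> list of A attribute strengths.
    samples:          dict mapping seen class name -> list of samples, where each
                      sample is a list of A attribute-based feature vectors.
    rng:              random.Random instance used to pick donors.
    k:                number of closest seen classes considered per attribute
                      (hypothetical selection rule, assumed for this sketch).
    """
    composed = []
    for a, target in enumerate(unseen_attrs):
        # Rank seen classes by how closely they match the unseen class on attribute a.
        ranked = sorted(
            seen_class_attrs,
            key=lambda c: abs(seen_class_attrs[c][a] - target),
        )
        # Pick a donor class among the k best matches, then a donor sample from it.
        donor_class = rng.choice(ranked[:k])
        donor_sample = rng.choice(samples[donor_class])
        # Keep only the donor's feature vector for attribute a.
        composed.append(donor_sample[a])
    return composed


# Toy data: two seen classes, two attributes, one sample per class.
seen = {"sparrow": [1.0, 0.0], "cardinal": [0.0, 1.0]}
feats = {
    "sparrow": [[[1.0, 1.0], [2.0, 2.0]]],
    "cardinal": [[[3.0, 3.0], [4.0, 4.0]]],
}
# An unseen class that shares attribute 0 with "sparrow" and attribute 1 with "cardinal".
dense = compose_dense_feature([1.0, 1.0], seen, feats, random.Random(0), k=1)
```

With `k=1` and one sample per class the choice of donors is deterministic, so `dense` takes its first attribute vector from the "sparrow" sample and its second from the "cardinal" sample. Raising `k` or adding samples per class changes the donors between calls, which is the source of diversity the abstract refers to.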
Review for NeurIPS paper: Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
Summary and Contributions: --- Update after rebuttal --- I thank the authors for their detailed rebuttal and for their effort to clarify the content of the paper and provide missing details. The authors have addressed the most pressing concerns, and in my opinion their work could be of interest to the community. I would strongly recommend, however, that the authors revise the presentation of their manuscript, in particular with respect to clarity, missing details, and claims. Please refine the use of certain terms (cf. the claims about generative models and self-training; see the correctness section) and add all the clarifications provided in the rebuttal, in particular the experimental details not provided in the main paper. The method uses the dense attribute attention method of [10] (DAZLE) to learn a set of attribute-specific feature vectors, and subsequently trains a classification model by alternating between updating the classifier (learning from seen features and generated unseen features) and generating new unseen features using the classifier's predictions.
Review for NeurIPS paper: Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
Initially, this paper received diverging reviews. The reviewers found the idea interesting but had some concerns regarding clarity and the difference between the proposed method and the DAZLE baseline. The authors provided a rebuttal clarifying the issues raised in the reviews, which satisfied the reviewers. During the discussion, some reviewers argued that the difference between DAZLE and the paper is clear and that the generated features have been shown to have the potential to identify new classes. All reviewers rated the paper positively (three "6: marginally above threshold" and one "7: accept") after the discussion phase, so overall the reviewers lean toward acceptance.